Predicting Students’ Chance of Admission Using Beta Regression

Marcus Chery and Keiron Green

Introduction

The formal introduction of beta regression is attributed to Ferrari and Cribari-Neto in their 2004 paper, “Beta Regression for Modelling Rates and Proportions”.

  • Definition: A statistical technique for modeling data that follows a beta distribution.
  • Key Characteristics:
    • Deals with continuous variables bounded between 0 and 1.
    • Suitable for rates or proportions.
  • Mechanics:
    • Models the effect of predictors on the mean of a beta-distributed response variable.
    • Incorporates a link function, similar to logistic regression.
    • Estimation typically done through maximum likelihood methods.
  • Case Study: This study focuses on a crucial aspect of academic admissions: understanding the factors influencing the chance of admission into academic programs.

    Our analysis revolves around five key variables: GRE scores, TOEFL scores, letters of recommendation (LOR), undergraduate CGPA, and research experience.

  • To identify and quantify the impact of various academic and experiential factors on the chances of admission.
  • To employ Beta Regression, an advanced statistical method suitable for modeling percentages or proportions, like the chance of admission.
  • Our goal is not just to find which factors are significant, but also to understand the extent of their influence.

By Pabloparsil — Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=89335966

Methods

  • Link appropriateness: assessed by plotting deviance residuals against the indices of the observations (at least for the logit link).

  • Models continuous random variables that take values in the open interval (0, 1), such as rates and proportions.

  • Dependent variable is beta-distributed.

  • Missing values and outliers should be addressed before the model is fitted.

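Beta regression model function:

$$g(\mu_i) = x_i^\top \beta = \eta_i$$

Link function (logit):

$$g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$$

The beta density formula, in the mean-precision parameterization of Ferrari and Cribari-Neto (2004):

$$f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, \quad 0 < y < 1,$$

where $0 < \mu < 1$ is the mean and $\phi > 0$ is the precision parameter, so that $\mathrm{Var}(y) = \mu(1-\mu)/(1+\phi)$.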
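As a minimal sketch, the logit link and the beta density in the mean-precision parameterization of Ferrari and Cribari-Neto (2004) can be written in plain Python. The function names and the values of `mu` and `phi` are hypothetical and chosen only for illustration; they are not estimates from the admissions data.

```python
import math

def logit(mu):
    """Logit link g(mu) = log(mu / (1 - mu)), defined for 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse link: maps a linear predictor eta back to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_density(y, mu, phi):
    """Beta density with mean mu and precision phi.

    The usual shape parameters are a = mu * phi and b = (1 - mu) * phi.
    """
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

# Hypothetical mean and precision, for illustration only.
mu, phi = 0.72, 20.0
eta = logit(mu)
round_trip = inv_logit(eta)

# Midpoint-rule check that the density integrates to about 1 over (0, 1).
n = 20000
area = sum(beta_density((i + 0.5) / n, mu, phi) for i in range(n)) / n

print(f"logit({mu}) = {eta:.4f}, inverse = {round_trip:.4f}")
print(f"area under the density: {area:.4f}")
```

Working with the mean and precision directly, rather than the two shape parameters, is what lets the predictors act on the mean through the link function.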

Data

  • The dataset was built to help students shortlist universities based on their profiles. The predicted output gives them a fair idea of their chances of admission to a particular university.

  • The dataset includes GRE score, TOEFL score, university rating, SOP (Statement of Purpose), LOR (Letter of Recommendation), CGPA, research experience, and chance of admission.

Parameter          Range                       Description
gre_score          290 - 340 (340 scale)       Quantifies a candidate’s performance on the Graduate Record Examination, with a maximum score of 340
toefl_score        92 - 120 (120 scale)        Measures English language proficiency, scored out of a total of 120 points
university_rating  1 - 5 (5 is highest)        Rates universities on a scale from 1 to 5, indicating their overall quality and reputation
sop                1 - 5 (5 is highest)        Evaluates the strength and quality of a candidate’s SOP on a scale of 1 to 5
lor                1 - 5 (5 is highest)        Evaluates the strength and quality of a candidate’s LOR on a scale of 1 to 5
cgpa               6.8 - 9.92 (10.0 scale)     Reflects a student’s academic performance in their undergraduate studies, scored on a 10-point scale
research           0 or 1                      Indicates whether a candidate has research experience (1) or not (0)
chance_of_admit    0.34 - 0.97 (0 to 1 scale)  Represents the likelihood of a student being admitted, expressed as a decimal between 0 and 1

Descriptive Statistics:

  • Average GRE Score: 316.81
  • Average TOEFL Score: 107.41
  • Average CGPA: 8.60

Outliers:

  • Outliers observed in upper ranges of CGPA and Chance of Admit.

Multicollinearity:

  • Moderate multicollinearity.
  • Highest VIFs: CGPA (5.21), GRE Score (4.62), TOEFL Score (4.29).
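To make the reported VIFs easier to read, the short sketch below uses the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the remaining predictors, and inverts it. The helper names are hypothetical; the three VIF values are the ones reported above.

```python
def implied_r_squared(vif):
    """R^2 of a predictor regressed on the other predictors, recovered from its VIF."""
    return 1.0 - 1.0 / vif

def vif_from_r_squared(r_squared):
    """Variance inflation factor for a given R^2."""
    return 1.0 / (1.0 - r_squared)

# VIF values as reported in this section.
reported_vifs = {"cgpa": 5.21, "gre_score": 4.62, "toefl_score": 4.29}

for name, vif in reported_vifs.items():
    r2 = implied_r_squared(vif)
    print(f"{name}: VIF = {vif:.2f}, implied R^2 on the other predictors = {r2:.3f}")
```

Read this way, a VIF near 5 means that roughly 80% of the variation in that predictor is explained by the other predictors, which is consistent with describing the multicollinearity as moderate.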

Correlation Analysis:

  • Strong correlations: GRE & TOEFL (0.84), GRE & CGPA (0.83), TOEFL & CGPA (0.83).
  • High correlation of Chance of Admit with CGPA (0.87), GRE Score (0.80), TOEFL Score (0.79).

Analysis and Results

  • Our chosen model is specified as: chance_of_admit ~ gre_score + toefl_score + lor + cgpa + research

  • The model was fitted by maximum likelihood: the estimates were refined iteratively to maximize the likelihood of the observed data under the model.
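The idea of iteratively refining estimates to maximize the likelihood can be sketched with a toy example. This is not the fitting routine used for the reported model: it assumes a single predictor, a logit link, and a simple derivative-free compass search, and the data in `xs` and `ys` are invented for illustration.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(params, xs, ys):
    """Beta regression log-likelihood with logit link; params = (b0, b1, log_phi)."""
    b0, b1, log_phi = params
    phi = math.exp(log_phi)  # optimizing log(phi) keeps the precision positive
    total = 0.0
    for x, y in zip(xs, ys):
        mu = inv_logit(b0 + b1 * x)
        a, b = mu * phi, (1.0 - mu) * phi
        total += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))
    return total

def compass_search(f, start, step=0.5, tol=1e-6, max_iter=10000):
    """Maximize f by trying +/- step on each coordinate; halve the step when stuck."""
    point, best = list(start), f(start)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = f(trial)
                if value > best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return point, best

# Toy data, invented for illustration: one predictor and a response in (0, 1).
xs = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.42, 0.55, 0.58, 0.70, 0.74, 0.85, 0.88, 0.93]

start = [0.0, 0.0, 0.0]
start_ll = log_likelihood(start, xs, ys)
estimate, fitted_ll = compass_search(lambda p: log_likelihood(p, xs, ys), start)
b0_hat, b1_hat, phi_hat = estimate[0], estimate[1], math.exp(estimate[2])

print(f"log-likelihood: {start_ll:.3f} at the start, {fitted_ll:.3f} at the estimate")
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}, phi = {phi_hat:.1f}")
```

Statistical software typically uses gradient-based optimizers for this step; the compass search is used here only because it is short and needs no derivatives.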

  • Residuals are scattered randomly around the zero line, with no obvious patterns such as curvature or increasing or decreasing spread.
  • Most residuals fall within the range of -2 to 2, which suggests that the model does not exhibit any significant outliers or influential observations.
  • The randomness in the distribution of residuals implies that our model meets the assumption of homoscedasticity and that the error variance is constant.
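The -2 to 2 check can be illustrated with standardized (Pearson-type) residuals, which divide each raw residual by the standard deviation implied by the beta model, sqrt(mu (1 - mu) / (1 + phi)). Which residual type was plotted for this study is not stated, so this choice is an assumption, and the observed values, fitted means, and precision below are hypothetical.

```python
import math

def pearson_residual(y, mu, phi):
    """Raw residual scaled by the model-implied standard deviation of y."""
    variance = mu * (1.0 - mu) / (1.0 + phi)
    return (y - mu) / math.sqrt(variance)

# Hypothetical observed values, fitted means, and precision, for illustration only.
observed = [0.92, 0.76, 0.72, 0.80, 0.65, 0.90]
fitted = [0.90, 0.80, 0.70, 0.78, 0.66, 0.86]
phi = 60.0

residuals = [pearson_residual(y, mu, phi) for y, mu in zip(observed, fitted)]
flagged = [i for i, r in enumerate(residuals) if abs(r) > 2.0]

print([round(r, 2) for r in residuals])
print("indices outside [-2, 2]:", flagged)
```

Because the variance of a beta-distributed response depends on its mean, the scaling step is what makes residuals comparable across observations with different fitted means.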

Conclusion

  • The beta regression model identified significant predictors of the chance of admission: GRE Score, TOEFL Score, Letters of Recommendation, CGPA, and Research Experience.

  • CGPA emerged as the strongest predictor, suggesting that academic performance in undergraduate studies is highly indicative of admission chances.

  • Research Experience also has a notable positive influence, underscoring the value of scholarly work in the admission process.

  • The study reinforces the importance of a well-rounded application, beyond test scores, highlighting the integral role of holistic review.

  • The model assumes a linear relationship between predictors and the logit of the admission chances, which may not capture all nuances.

  • The analysis is based on a specific dataset, which may limit the generalizability of the findings to other contexts or institutions.

  • Future research could explore the inclusion of interaction terms and non-linear relationships.

References

Ferrari, Silvia, and Francisco Cribari-Neto. 2004. "Beta Regression for Modelling Rates and Proportions." Journal of Applied Statistics 31 (7): 799–815. https://EconPapers.repec.org/RePEc:taf:japsta:v:31:y:2004:i:7:p:799-815.